Abstract

We consider the problem of ranking a set of OT con- 
straints in a manner consistent with data. (1) We 
speed up Tesar and Smolensky's RCD algorithm to 
be linear on the number of constraints. This finds a 
ranking so each attested form $x_i$ beats or ties a particular
competitor $y_i$. (2) We also generalize RCD
so each $x_i$ beats or ties all possible competitors.

Alas, neither ranking as in (2) nor even generation 
has any polynomial algorithm unless P = NP — i.e., 
one cannot improve qualitatively upon brute force: 
(3) Merely checking that a single (given) ranking is 
consistent with given forms is coNP-complete if the 
surface forms are fully observed and $\Delta_2^p$-complete if
not. Indeed, OT generation is OptP-complete. (4)
As for ranking, determining whether any consistent
ranking exists is coNP-hard (but in $\Delta_2^p$) if the forms
are fully observed, and $\Sigma_2^p$-complete if not.

Finally, we show (5) generation and ranking are
easier in derivational theories: in P and NP-complete, respectively.

1 Introduction 

Optimality Theory (OT) is a grammatical
paradigm that was introduced by Prince and
Smolensky (1993) and suggests various computational
questions, including learnability.

Following Gold (1967) we might ask: Is the
language class $\{L(Q) : Q \text{ is an OT grammar}\}$
learnable in the limit? That is, is there a learning
algorithm that will converge on any OT-describable
language $L(Q)$ if presented with an
enumeration of its grammatical forms?

In this paper we consider an orthogonal ques- 
tion that has been extensively investigated by 
Tesar and Smolensky (1996), henceforth T&S.
Rather than asking whether a learner can even- 
tually find an OT grammar compatible with an 
unbounded set of positive data, we ask: How 
efficiently can it find a grammar (if one exists) 
compatible with a finite set of positive data? 

Sections 3-5 present successively more realistic
versions of the problem (sketched in the abstract).
The easiest version turns out to be easier
than previously known. The harder versions
turn out to be harder than previously known.

* Many thanks go to Lane and Edith Hemaspaandra
for references to the complexity literature, and to Bruce
Tesar for comments on an earlier draft.

2 Formalism 

An OT grammar Q consists of three elements, 
any or all of which may need to be learned: 

• a set $\mathcal{L}$ of underlying forms produced by
a lexicon or morphology,

• a function Gen that maps any underlying
form to a set of candidates, and

• a vector $\vec{C} = (C_1, C_2, \ldots, C_n)$ of constraints,
each of which is a function from
candidates to the natural numbers $\mathbb{N}$.

$C_i$ is said to rank higher than (or outrank)
$C_j$ in $\vec{C}$ iff $i < j$. We say $x$ satisfies $C_i$ if
$C_i(x) = 0$, else $x$ violates $C_i$.

The grammar Q defines a relation that
maps each $u \in \mathcal{L}$ to the candidate(s)
$x \in \mathrm{Gen}(u)$ for which the vector
$\vec{C}(x) = (C_1(x), C_2(x), \ldots, C_n(x))$ is lexicographically
minimal. Such candidates are called optimal.

One might then say that the grammatical
forms are the pairs $(u, x)$ of this relation. But
for simplicity of notation and without loss of
generality, we will suppose that the candidates
$x$ are rich enough that $u$ can always be recovered
from $x$.^1 Then $u$ is redundant and we may
simply take the candidate $x$ to be the grammatical
form. Now the language $L(Q)$ is simply the
image of $\mathcal{L}$ under Q. We will write $u_x$ for the
underlying form, if any, such that $x \in \mathrm{Gen}(u_x)$.

An attested form of the language is a candidate
$x$ that the learner knows to be grammatical
(i.e., $x \in L(Q)$). $y$ is a competitor of $x$ if they
are both in the same candidate set: $u_x = u_y$. If
$x, y$ are competitors with $\vec{C}(y) < \vec{C}(x)$, we say
that $y$ beats $x$ (and then $x$ is not optimal).
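
The definitions above can be illustrated with a minimal Python sketch. It assumes a finite candidate set and models each constraint as a function from a candidate string to a violation count; the helper names (`profile`, `beats`, `optimal`) and the three toy constraints are hypothetical illustrations, not part of the formalism itself.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Constraint = Callable[[str], int]  # candidate -> number of violations


def profile(ranking: Sequence[Constraint], x: str) -> Tuple[int, ...]:
    """Violation vector of x, highest-ranked constraint first."""
    return tuple(c(x) for c in ranking)


def beats(ranking: Sequence[Constraint], y: str, x: str) -> bool:
    """True iff competitor y is strictly more harmonic than x."""
    # Python compares tuples lexicographically, which is the OT ordering.
    return profile(ranking, y) < profile(ranking, x)


def optimal(ranking: Sequence[Constraint], candidates: Iterable[str]) -> List[str]:
    """Candidates whose violation vector is lexicographically minimal."""
    cands = list(candidates)
    best = min(profile(ranking, x) for x in cands)
    return [x for x in cands if profile(ranking, x) == best]


# Hypothetical toy constraints for a 3-segment underlying form "pat".
def no_coda(x: str) -> int:
    return int(x[-1] not in "aeiou")   # penalize a final consonant


def dep(x: str) -> int:
    return max(0, len(x) - 3)          # penalize inserted segments


def max_io(x: str) -> int:
    return max(0, 3 - len(x))          # penalize deleted segments


CANDIDATES = ["pat", "pa", "pata"]
```

Reordering the same three constraints changes the winner, which is the sense in which only the ranking needs to be learned.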

^1 This is necessary in any case if $C_i(x)$ is to depend
on (all of) the underlying form $u$. In general, we expect
that each candidate $x \in \mathrm{Gen}(u)$ encodes an alignment of
the underlying form $u$ with some possible surface form
$s$, and $C_i(x)$ evaluates this pair on some criterion.
An ordinary learner does not have access to
attested forms, since observing that $x \in L(Q)$
would mean observing an utterance's entire
prosodic structure and underlying form, which
ordinarily are not vocalized. An attested set
of the language is a set $X$ such that the learner
knows that some $x \in X$ is grammatical (but not
necessarily which $x$). The idea is that a set is attested
if it contains all possible candidates that
are consistent with something a learner heard.^2
An attested surface set (the case considered
in this paper) is an attested set all of whose elements
are competitors; i.e., the learner is sure
of the underlying form but not the surface form.

Some computational treatments of OT place
restrictions on the grammars that will be considered.
The finite-state assumptions (Ellison, 1994;
Eisner, 1997a; Frank and Satta, 1998;
Karttunen, 1998; Wareham, 1998) are that

• candidates and underlying forms are repre- 
sented as strings over some alphabet; 

• Gen is a regular relation;^3

• each $C_i$ can be implemented as a
weighted deterministic finite-state automaton
(WDFA) (i.e., $C_i(x)$ is the total weight
of the path accepting $x$ in the WDFA);

• the set $\mathcal{L}$ of underlying forms and any attested sets are regular.



The bounded-violations assumption (Frank
and Satta, 1998; Karttunen, 1998) is that the
value of $C_i(x)$ cannot increase with $|x|$, but is
bounded above by some $k$.

In this paper, we do not always impose these 
additional restrictions. However, when demon- 
strating that problems are hard, we usually 
adopt both restrictions to show that the prob- 
lems are hard even for the restricted case. 



^2 This is of course a simplification. Attested sets corresponding
to laugh and laughed can represent the learner's
uncertainty about the respective underlying forms, but
not the knowledge that the underlying forms are related.
In this case, we can solve the problem by packaging the
entire morphological paradigm of laugh as a single candidate,
whose attested set is constrained by the two surface
observations and by the requirement of a shared underlying
stem. (A $k$-member paradigm may be encoded in
a form suitable to a finite-state system by interleaving
symbols from $2k$ aligned tapes that describe the $k$ underlying
and $k$ surface forms.) Alas, this scheme only works
within disjoint finite paradigms: while it captures the
shared underlying stem of laugh and laughed, it ignores
the shared underlying suffix of laughed and frowned.

^3 Ellison (1994) makes only the weaker assumption
that Gen($u$) is a regular set for each $u$.



Throughout this paper, we follow T&S in
supposing that the learner already knows the
correct set of constraints $\mathcal{C} = \{C_1, C_2, \ldots, C_n\}$,
but must learn their order $\vec{C} = (C_1, C_2, \ldots, C_n)$,
known as a ranking of $\mathcal{C}$. The assumption follows
from the OT philosophy that $\mathcal{C}$ is universal
across languages, and only the order of constraints
differs. The algorithms for learning a
ranking, however, are designed to be general for
any $\mathcal{C}$, so they take $\mathcal{C}$ as an input.^4

3 RCD as Topological Sort 

T&S investigate the problem of ranking a 
constraint set $\mathcal{C}$ given a set of attested
forms $x_1, \ldots, x_m$ and corresponding competitors
$y_1, \ldots, y_m$. The problem is to determine a ranking
$\vec{C}$ such that for each $i$, $\vec{C}(x_i) \le \vec{C}(y_i)$
lexicographically. Otherwise $x_i$ would be ungrammatical,
as witnessed by $y_i$.

In this section we give a concise presentation 
and analysis of T&S's Recursive Constraint 
Demotion (RCD) algorithm for this problem. 
Our presentation exposes RCD's connection to 
topological sort, from which we borrow a simple 
bookkeeping trick that speeds it up. 

3.1 Compiling into Boolean Formulas 

The first half of the RCD algorithm extracts
the relevant information from the $\{x_i\}$ and
$\{y_i\}$, producing what T&S call mark-data pairs.
We use a variant notation. For each constraint
$C \in \mathcal{C}$, we construct a negation-free,
conjunctive-normal form (CNF) Boolean formula
$\phi(C)$ whose literals are other constraints:

$$\phi(C) = \bigwedge_{i:\, C(x_i) > C(y_i)} \;\; \bigvee_{C':\, C'(x_i) < C'(y_i)} C'$$



^4 That is, these methods are not tailored (as others
might be) to exploit the structure of some specific, putatively
universal $\mathcal{C}$. Hence they require time at least
linear on $n = |\mathcal{C}|$, if only to read all the constraints.
Given the variety of cross-linguistic constraints in the
literature, one must worry: is $n$ huge? Most authors
following Ellison (1994) allow as constraints all the regular
languages over some alphabet $\Sigma$; then n > s^d^l"!)
distinct constraints can be described by DFAs of size $s$,
where $\Sigma$ (or $s$) must be large to accommodate all features
and prosodic constituents. One solution: let each
constraint constrain only a few symbols in $\Sigma$ (e.g., bound
the number of non-default transitions per DFA). Indeed,
Eisner (1997a; 1997b) proposes that $\mathcal{C}$ is the union of
two "primitive" constraint families. If each primitive
constraint may mention at most $t$ of $T$ autosegmental
tiers, then $n = O(T^t)$, which is manageable for small $t$.
The interpretation of the literal $C'$ in $\phi(C)$
is that $C'$ outranks $C$. It is not hard to see
that a constraint ranking is a valid solution iff
it satisfies $\phi(C)$ for every $C$. For example, if
$\phi(d) = (a \lor b \lor c) \land (b \lor e \lor f)$, this means that
$d$ must be outranked by either $a$, $b$ or $c$ (else $x_1$
is ungrammatical) and also by either $b$, $e$ or $f$
(else $x_2$ is ungrammatical).

How expensive is this compilation step? Observe
that the inner term $\bigvee_{C':\, C'(x_i) < C'(y_i)} C'$ is
independent of $C$, so it only needs to be computed
and stored once. Call this term $D_i$. We
first construct all $m$ of the disjunctive clauses
$D_i$, requiring time and storage $O(mn)$. Then
we construct each of the $n$ formulas
$\phi(C) = \bigwedge_{i:\, C(x_i) > C(y_i)} D_i$ as a list of pointers to up to $m$
clauses, again taking time and storage $O(mn)$.

The computation time is $O(mn)$ for the steps
we have already considered, but we must add
$O(mnE)$, where $E$ is the cost of precomputing
each $C(x_i)$ or $C(y_i)$ and may depend on properties
of the constraints and input forms.

We write $M$ ($= O(mn)$) for the exact storage
cost of the formulas, i.e., $M = \sum_i |D_i| + \sum_C |\phi(C)|$,
where $|\phi(C)|$ counts only the number
of conjuncts.
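
As a hedged illustration of this compilation step, the sketch below assumes the violation counts have already been precomputed (the $O(mnE)$ part) and are given as tuples indexed by constraint number. The names `compile_clauses` and `storage_cost` are hypothetical. Each clause $D_i$ is built once, and each $\phi(C)$ is stored as a list of clause indices, which play the role of pointers.

```python
from typing import FrozenSet, List, Sequence, Tuple


def compile_clauses(
    n: int,
    xs: Sequence[Sequence[int]],
    ys: Sequence[Sequence[int]],
) -> Tuple[List[FrozenSet[int]], List[List[int]]]:
    """xs[i][c] and ys[i][c] are the violation counts of constraint c on the
    attested form x_i and on its competitor y_i. Returns (clauses, phi)."""
    clauses: List[FrozenSet[int]] = []
    phi: List[List[int]] = [[] for _ in range(n)]
    for i, (x, y) in enumerate(zip(xs, ys)):
        # D_i: the constraints that prefer x_i to y_i.
        clauses.append(frozenset(c for c in range(n) if x[c] < y[c]))
        # D_i becomes a conjunct of phi(C) for every C that prefers y_i.
        for c in range(n):
            if x[c] > y[c]:
                phi[c].append(i)
    return clauses, phi


def storage_cost(clauses: List[FrozenSet[int]], phi: List[List[int]]) -> int:
    """M = sum_i |D_i| + sum_C |phi(C)|."""
    return sum(len(d) for d in clauses) + sum(len(p) for p in phi)
```

With constraints a..f numbered 0..5, two mark-data pairs suffice to reproduce the example formula $\phi(d) = (a \lor b \lor c) \land (b \lor e \lor f)$ from the text.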

3.2 Finding a Constraint Ranking 

The problem is now to find a constraint ranking
that satisfies $\phi(C)$ for every $C \in \mathcal{C}$. Consider
the special case where each $\phi(C)$ is a simple
conjunction of literals, that is, $(\forall i)\,|D_i| = 1$.
This is precisely the problem of topologically
sorting a directed graph with $n$ vertices and
$\sum_C |\phi(C)| = M/2$ edges. The vertex set is $\mathcal{C}$,
and $\phi(C)$ lists the parents of vertex $C$, which
must all be enumerated before $C$.

Topological sort has two well-known $O(M + n)$
algorithms (Cormen et al., 1990). One is
based on depth-first search. Here we will focus 
on the other, which is: Repeatedly find a vertex 
with no parents, enumerate it, and remove it 
and its outgoing edges from the graph. 

The second half of T&S's RCD algorithm is
simply the obvious generalization of this topological
sort method (to directed hypergraphs,
in fact, formally speaking). We describe it as a
function Rcd($\mathcal{C}$, $\phi$) that returns a ranking $\vec{C}$:

1. If $\mathcal{C} = \emptyset$, return (). Otherwise:

2. Identify a $C_1 \in \mathcal{C}$ such that $\phi(C_1)$ is empty.
($C_1$ is surface-true, or "undominated.")

3. If there is no such constraint, then fail: no
ranking can be consistent with the data.

4. Else, for each $C \in \mathcal{C}$, destructively remove
from $\phi(C)$ any disjunctive clause $D_i$ that
mentions $C_1$.

5. Now recursively compute and return
$\vec{C} = (C_1, \mathrm{Rcd}(\mathcal{C} - \{C_1\}, \phi))$.
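
The five steps can be sketched directly in Python. This is a minimal, unoptimized rendering under stated assumptions: constraints are identified by names, `phi` maps each name to a list of clauses (each clause a set of names), recursion is replaced by a loop, and ties among undominated constraints are broken alphabetically (an arbitrary choice made only to keep the output deterministic). The function name `rcd` is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def rcd(constraints: Iterable[str],
        phi: Dict[str, List[Set[str]]]) -> Optional[List[str]]:
    """Return a ranking satisfying every phi(C), or None if none exists."""
    remaining = set(constraints)
    # Work on a copy so the caller's formulas are not destroyed.
    formulas = {c: list(phi.get(c, [])) for c in remaining}
    ranking: List[str] = []
    while remaining:                                   # step 1
        # Step 2: find an undominated constraint (empty formula).
        top = next((c for c in sorted(remaining) if not formulas[c]), None)
        if top is None:                                # step 3
            return None
        # Step 4: every clause mentioning `top` is now satisfied.
        for c in remaining:
            formulas[c] = [d for d in formulas[c] if top not in d]
        remaining.remove(top)
        ranking.append(top)                            # step 5
    return ranking
```

Each pass rescans all remaining formulas, so this version does not achieve the linear bound discussed in §3.3; it is meant only to mirror the step-by-step description.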

Correctness of Rcd($\mathcal{C}$, $\phi$) is straightforward,
by induction on $n = |\mathcal{C}|$. The base case $n = 0$
is trivial. For $n > 0$: $\phi(C_1)$ is empty and
therefore satisfied. $\phi(C)$ is also satisfied for all
other $C$: any clauses containing $C_1$ are satisfied
because $C_1$ outranks $C$, and any other clauses
are preserved in the recursive call and therefore
satisfied by the inductive hypothesis.

We must also show completeness of
Rcd($\mathcal{C}$, $\phi$): if there exists at least one correct
answer $\vec{B}$ then the function must not fail.
Again we use induction on $n$. The base case
$n = 0$ is trivial. For $n > 0$: Observe that $\phi(B_1)$
is satisfied in $\vec{B}$, by correctness of $\vec{B}$. Since $B_1$
is not outranked by anything, this implies that
$\phi(B_1)$ is empty, so Rcd has at least one choice
for $C_1$ and does not fail. It is easy to see that $\vec{B}$
with $C_1$ removed would be a correct answer for
the recursive call, so the inductive hypothesis
guarantees that that call does not fail either.

3.3 More Efficient Bookkeeping 

T&S (p. 61) analyze the Rcd function as taking
time $O(mn^2)$; in fact their analysis shows
more precisely $O(Mn)$. We now point out that
careful bookkeeping can make it operate in time
$O(M + n)$, which is at worst $O(mn)$ provided
$n > 0$. This means that the whole RCD algorithm
can be implemented in time $O(mnE)$,
i.e., it is bounded by the cost of applying all the
constraints to all the forms.

First consider the special case discussed 
above, topological sort. In linear-time topolog- 
ical sort, each vertex maintains a list of its chil- 
dren and a count of its parents, and the program 
maintains a list of vertices whose parent count 
has become 0. The algorithm then requires only
$O(1)$ time to find and remove each vertex, and
$O(1)$ time to remove each edge, for a total time
of $O(M + n)$ plus $O(M + n)$ for initialization.

We can organize RCD similarly. We change
our representations (not affecting the compilation
time in §3.1). Constraint $C$ need not
store $\phi(C)$. Rather, $C$ should maintain a list
of pointers to clauses $D_i$ in which it appears as
a disjunct (cf. "a list of its children") as well as
the integer $|\phi(C)|$ (cf. "a count of its parents").
The program should maintain a list of "undominated"
constraints for which $|\phi(C)|$ has become
0. Finally, each clause $D_i$ should maintain a list
of constraints $C$ such that $D_i$ appears in $\phi(C)$.

Step 2 of the algorithm is now trivial: remove
the head $C_1$ of the list of undominated constraints.
For step 4, iterate over the stored list
of clauses $D_i$ that mention $C_1$. Eliminate each
such $D_i$ as follows: iterate over the stored list
of constraints $C$ whose $\phi(C)$ includes $D_i$ (and
then reset that list to empty), and for each such
$C$, decrement $|\phi(C)|$, adding $C$ to the undominated
list if $|\phi(C)|$ becomes 0.

The storage cost is still $O(M + n)$. In particular,
$\phi(C)$ is now implicitly stored as $|\phi(C)|$ backpointers
from its clauses $D_i$, and $D_i$ is now implicitly
stored as $|D_i|$ backpointers from its disjuncts
(e.g., $C_1$). Since RCD removes each constraint
and considers each backpointer exactly
once, in $O(1)$ time, its runtime is $O(M + n)$.
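
The bookkeeping just described can be sketched as follows. This is one possible realization under stated assumptions rather than a reference implementation: constraints are numbered 0..n-1, `clauses[i]` holds the disjuncts of $D_i$, and `phi[c]` lists the indices of the clauses that are conjuncts of $\phi(c)$. The names `rcd_linear`, `appears_in`, `owners` and `count` are hypothetical.

```python
from collections import deque
from typing import Iterable, List, Optional, Sequence


def rcd_linear(n: int,
               clauses: Sequence[Iterable[int]],
               phi: Sequence[Sequence[int]]) -> Optional[List[int]]:
    """Rank constraints 0..n-1 in O(M + n) time, or return None on failure."""
    # "List of children": clauses D_i in which each constraint is a disjunct.
    appears_in: List[List[int]] = [[] for _ in range(n)]
    for i, d in enumerate(clauses):
        for c in d:
            appears_in[c].append(i)
    # Backpointers: constraints C such that D_i is a conjunct of phi(C).
    owners: List[List[int]] = [[] for _ in clauses]
    # "Count of parents": |phi(C)|, the number of unsatisfied conjuncts.
    count = [0] * n
    for c in range(n):
        for i in phi[c]:
            owners[i].append(c)
            count[c] += 1
    undominated = deque(c for c in range(n) if count[c] == 0)
    ranking: List[int] = []
    while undominated:
        top = undominated.popleft()          # step 2 in O(1)
        ranking.append(top)
        for i in appears_in[top]:            # step 4: eliminate each D_i
            for c in owners[i]:
                count[c] -= 1
                if count[c] == 0:
                    undominated.append(c)
            owners[i] = []                   # D_i is eliminated only once
    # If some constraint never became undominated, no ranking exists.
    return ranking if len(ranking) == n else None
```

Every backpointer is visited once, so the loop does work proportional to $M + n$ after the linear-time initialization.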

In short, this simple bookkeeping trick eliminates
RCD's quadratic dependence on $n$, the
number of constraints to rank. As already
mentioned, the total runtime is now dominated
by $O(mnE)$, the preprocessing cost of
applying all the constraints to all the input
forms. Under the finite-state assumption, this
can be more tightly bounded as
$O(n \cdot \text{total size of input forms}) = O(n \cdot \sum_i (|x_i| + |y_i|))$,
since the cost of running a form through a
WDFA is proportional to the former's length.

3.4 Alternative Algorithms 

T&S also propose an alternative to RCD called 
Constraint Demotion (CD), which is perhaps 
better-known. (They focus primarily on it, and 
Kager's textbook (1999) devotes a chapter to
it.) A disjunctive clause $D_i$ (compiled as in
§3.1) is processed roughly as follows: for each
$C$ such that $D_i$ is an unsatisfied clause of $\phi(C)$,
greedily satisfy it by demoting $C$ as little as possible.
CD repeatedly processes $D_1, \ldots, D_m$ until
all clauses in all formulas are satisfied.

CD can be efficiently implemented so that
each pass through all clauses takes time proportional
to $M$. But it is easy to construct datasets
that require $n + 1$ passes. So the ranking step
can take time $O(Mn)$, which contrasts unfavorably
with the $O(M + n)$ time for Rcd.

CD does have the nice property (unlike RCD)
that it maintains a constraint ranking at all
times. An "online" (memoryless) version of CD
is simply to generate, process, and discard each
clause $D_i$ upon arrival of the new data pair
$(x_i, y_i)$; this converges, given sufficient data. But
suppose one wishes to maintain a ranking that is
consistent with all data seen so far. In this case,
CD is slower than RCD. Modifying a previously
correct ranking to remain correct given the new
clause $D_i$ requires at least one pass through all
clauses $D_1, \ldots, D_i$ (as slow as RCD) and up to
$n + 1$ passes (as slow as running CD on all clauses
from scratch, ignoring the previous ranking).



4 Considering All Competitors 

The algorithms of the previous section only ensure
that each attested form $x_i$ is at least as harmonic
as a given competitor $y_i$: $\vec{C}(x_i) \le \vec{C}(y_i)$.
But for $x_i$ to be grammatical, it must be at least
as harmonic as all competitors. We would like
a method that ensures this. Such a method will
rank a constraint set $\mathcal{C}$ given only a set of attested
forms $\{x_1, \ldots, x_m\}$.

Like T&S, whose algorithm for this case is
discussed in §4.2, here we (dangerously) assume
we have an efficient computation of OT's
production function Opt($\vec{C}$, $u$) (such as Ellison
(1994), Tesar (1996), or Eisner (1997a)). This
returns the subset of Gen($u$) on which $\vec{C}(\cdot)$ is
lexicographically minimal, i.e., the set of grammatical
outputs for $u$. For the analysis, let $P$ be
a bound on the runtime of our Opt algorithm.
We will discuss this runtime further in §6.



4.1 Generalizing RCD 

We propose to solve this problem by running 
something like our earlier RCD algorithm, but 
considering all competitors at once. 

First, as a false start, let us try to construct
the requirements $\phi(C)$ in this case. Consider
the contribution of a single $x_i$ to a particular
$\phi(C)$. $x_i$ demands that for any competitor $y$
such that $C(x_i) > C(y)$, $C$ must be outranked
by some $C'$ such that $C'(x_i) < C'(y)$. One set
of competitors $y$ might all add the same clause
$(a \lor b \lor c)$ to $\phi(C)$; another set might add a
different clause $(b \lor d \lor e)$.

The trouble here is that $\phi(C)$ may become
intractably large. This will happen if the constraints
are roughly orthogonal to one another.
For example, suppose the candidates are bit
strings of length $n$, and for each $k$, there exists
a constraint $\mathrm{OFF}_k$ preferring the $k$th bit to
be zero.^5 If $x_i = 1000\cdots0$, then $\phi(\mathrm{OFF}_1)$ contains
all $2^{n-1}$ possible clauses: for example, it
contains $(\mathrm{OFF}_2 \lor \mathrm{OFF}_4 \lor \mathrm{OFF}_5)$ by virtue of the
competitor $y = 0101100000\cdots$. Of course, the
conjunction of all these clauses can be drastically
simplified in this case, but not in general.
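
The blow-up in this example can be checked by brute force for small $n$. The sketch below is only an illustration under the stated setup (candidates are bit tuples, $\mathrm{OFF}_k$ returns the $k$th bit, and the attested form is $10\cdots0$); the function name `off1_clauses` is hypothetical.

```python
from itertools import product
from typing import FrozenSet, Set


def off1_clauses(n: int) -> Set[FrozenSet[int]]:
    """Distinct clauses contributed to phi(OFF_1) by x = 10...0 when the
    competitors are all bit strings of length n. Constraints are 1-based."""
    x = (1,) + (0,) * (n - 1)
    clauses: Set[FrozenSet[int]] = set()
    for y in product((0, 1), repeat=n):
        if x[0] > y[0]:  # OFF_1 prefers y, so OFF_1 must be outranked
            # The clause lists every OFF_k that prefers x to y.
            clauses.add(frozenset(k + 1 for k in range(n) if x[k] < y[k]))
    return clauses
```

Note that the competitor $00\cdots0$ contributes the empty clause, which no ranking can satisfy; the count of $2^{n-1}$ includes it.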

Therefore, we will skip the step of constructing
formulas $\phi(C)$. Rather, we will run something
like Rcd directly: greedily select a constraint
$C_1$ that does not eliminate any of the
attested forms $x_i$ (but that may eliminate some
of its competitors), similarly select $C_2$, etc.

In our new function RcdAll($\mathcal{C}$, $\vec{B}$, $\{x_i\}$), the
input includes a partial hierarchy $\vec{B}$ listing the
constraints chosen at previous steps in the recursion.
(On a non-recursive call, $\vec{B} = ()$.)

1. If $\mathcal{C} = \emptyset$, return (). Otherwise:

2. By trying all constraints, find a constraint
$C_1$ such that $(\forall i)\; x_i \in \mathrm{Opt}((\vec{B}, C_1), u_{x_i})$.

3. If there is no such constraint, then fail: no
ranking can be consistent with the data.

4. Else recursively compute and return
$\vec{C} = (C_1, \mathrm{RcdAll}(\mathcal{C} - \{C_1\}, (\vec{B}, C_1), \{x_i\}))$.

It is easy to see by induction on $|\mathcal{C}|$ that
RcdAll is correct: if it does not fail, it always
returns a ranking $\vec{C}$ such that each $x_i$ is
grammatical under the ranking $(\vec{B}, \vec{C})$. It is
also complete, by the same argument we used
for RCD: if there exists a correct ranking, then
there is a choice of $C_1$ for this call and there
exists a correct ranking on the recursive call.

The time complexity of RcdAll is $O(mn^2 P)$.
Preprocessing and compilation are no longer
necessary (that work is handled by Opt). We
note that if Opt is implemented by successive
winnowing of an appropriately represented
candidate set, as is common in finite-state approaches,
then it is desirable to cache the sets
returned by Opt at each call, for use on the recursive
call. Then Opt$((\vec{B}, C_1), u_{x_i})$ need not
be computed from scratch: it is simply the subset
of Opt$(\vec{B}, u_{x_i})$ on which $C_1(\cdot)$ is minimal.
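
A minimal sketch of RcdAll with the caching idea follows. It makes strong simplifying assumptions that the text does not: candidate sets are finite and explicitly enumerable, so Opt is computed by brute-force winnowing of a Python list rather than by finite-state methods. The names `rcd_all`, `gen`, `off`, `on`, `all_bitstrings` and `TOY` are hypothetical, and ties among admissible constraints are broken alphabetically.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence

Constraint = Callable[[str], int]


def rcd_all(constraints: Dict[str, Constraint],
            attested: Sequence[str],
            gen: Callable[[str], List[str]]) -> Optional[List[str]]:
    """Greedily rank constraints so that every attested form stays optimal.
    gen(x) must return the finite candidate set Gen(u_x), which contains x."""
    remaining = dict(constraints)
    # cache[i] holds Opt(B, u_{x_i}); with B = () it is all of Gen(u_{x_i}).
    cache = [list(gen(x)) for x in attested]
    ranking: List[str] = []
    while remaining:
        chosen, chosen_sets = None, None
        for name in sorted(remaining):
            c = remaining[name]
            narrowed = []
            for x, cands in zip(attested, cache):
                best = min(c(y) for y in cands)
                if c(x) != best:      # this constraint would eliminate x_i
                    break
                narrowed.append([y for y in cands if c(y) == best])
            else:
                chosen, chosen_sets = name, narrowed
                break
        if chosen is None:
            return None               # no consistent ranking exists
        ranking.append(chosen)
        cache = chosen_sets           # reuse the winnowed sets next round
        del remaining[chosen]
    return ranking


# Hypothetical toy grammar over 3-bit strings.
def off(k: int) -> Constraint:
    return lambda s: int(s[k] == "1")


def on(k: int) -> Constraint:
    return lambda s: int(s[k] == "0")


def all_bitstrings(x: str) -> List[str]:
    return ["".join(bits) for bits in product("01", repeat=len(x))]


TOY = {"off1": off(0), "off2": off(1), "off3": off(2), "on2": on(1)}
```

In the toy grammar, the attested form `010` forces `on2` above `off2`, while attesting both `010` and `000` for the same candidate set makes the data inconsistent.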

^5 $\mathrm{OFF}_k(x)$ simply extracts the $k$th bit of $x$. We will
later denote it as C-,„j. . 



4.2 Alternative Algorithms 

T&S provide a different, rather attractive so- 
lution to this problem, which they call Error- 
Driven Constraint Demotion (EDCD). This is 
identical to the "online" CD algorithm of §3.4, 
except that for each attested form $x$ that is
presented to the learner, EDCD automatically
chooses a competitor $y \in \mathrm{Opt}(\vec{C}, u_x)$, where $\vec{C}$
is the ranking at the time.

If the supply of attested forms $x_1, \ldots, x_m$ is
limited, as assumed in this paper, one may iterate
over them repeatedly, modifying $\vec{C}$, until
they are all optimal. When an attested form $x$ is
suboptimal, the algorithm takes time $O(nE)$ to
compile $x, y$ into a disjunctive clause and time
$O(n)$ to process that clause using CD.^6

T&S show that the learner converges after
seeing at most $O(n^2)$ suboptimal attested
forms, and hence after at most $O(n^2)$ passes
through $x_1, \ldots, x_m$. Hence the total time is
$O(n^3 E + mn^2 P)$, where $P$ is the time required
by Opt. This is superficially worse than our
RcdAll, which takes time $O(mn^2 P)$, but really
about as good since $P$ dominates (see §6).

Mainly, RcdAll is simpler. §0 (note ||) also 
shows that RcdAll needs less information from 
each call to Opt; this improves the complexity 
class of the call, though not of the full algorithm. 

Algorithms that adjust constraint rankings 
or weights along a continuous scale include the 
Gradual Learning Algorithm (Boersma, 1997),
which resembles simulated annealing, and maximum
likelihood estimation (Johnson, 2000).
These methods have the considerable advantage 
that they can deal with noise and free variation 
in the attested data. Both algorithms repeat 
until convergence, which makes it difficult to 
judge their efficiency except by experiment. 



5 Incompletely Observed Forms 

We now add a further wrinkle. Suppose the
input to the learner specifies only $\mathcal{C}$ together
with attested surface sets $\{X_i\}$, as defined in
§2, rather than attested forms. This version of
the problem captures the learner's uncertainty


^6 Instead of using CD on the new clause only, one may
use RCD to find a ranking consistent with all clauses 
generated so far. This step takes worst-case time 0{n^) 
rather than $O(n)$ even with our improved algorithm, but
may allow faster convergence. Tesar (1997) calls this
version Multi-Recursive Constraint Demotion (MRCD).
about the full description of the surface material.
As before, the goal is to rank $\mathcal{C}$ in a manner
consistent with the input.

With this wrinkle, even determining whether
such a ranking exists turns out to be surprisingly
harder. In §7 we will see that it is actually
$\Sigma_2^p$-complete. Here we only show it NP-hard,
using a construction that suggests that the NP-hardness
stems from the need to consider exponentially
many rankings or surface forms.

5.1 NP-Hardness Construction 

Given $r \in \mathbb{N}$, we will be considering finite-state
OT grammars of the following form:

• $\mathcal{L} = \{\epsilon\}$.

• Gen($\epsilon$) = $\Sigma^r$, the set of all length-$r$ strings
over the alphabet $\Sigma = \{1, 2, \ldots, r\}$. (This
set can be represented with a straight-line
DFA of $r + 1$ states and arcs.)

• $\mathcal{C} = \{\mathrm{EARLY}_j : 1 \le j \le r\}$, where for any
$x \in \Sigma^*$, the constraint $\mathrm{EARLY}_j(x)$ counts
the number of digits in $x$ before the first
occurrence of digit $j$, if any. For example,
$\mathrm{EARLY}_3(2188353) = \mathrm{EARLY}_3(2188) = 4$.
(Each such constraint can be implemented
by a WDFA of 2 states and $2r$ arcs.)

$\mathrm{EARLY}_j$ favors candidates in which $j$ appears
early. The ranking $(\mathrm{EARLY}_5, \mathrm{EARLY}_8,
\mathrm{EARLY}_1, \ldots)$ favors candidates of the form
$581\cdots$; no other candidate can be grammatical.
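
To make the behavior of these constraints concrete, the following sketch represents candidates as tuples of integers (an assumption made so that $r$ may exceed 9) and finds the optimal forms by brute-force enumeration of $\Sigma^r$, which is only feasible for small $r$. The names `early` and `optimal_forms` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[int, ...]


def early(j: int) -> Callable[[Candidate], int]:
    """EARLY_j: number of digits before the first occurrence of j,
    or the full length of the string if j never occurs."""
    def constraint(x: Candidate) -> int:
        return x.index(j) if j in x else len(x)
    return constraint


def optimal_forms(order: Sequence[int], r: int) -> List[Candidate]:
    """Optimal candidates in Sigma^r under (EARLY_{order[0]}, EARLY_{order[1]}, ...)."""
    ranking = [early(j) for j in order]

    def profile(x: Candidate) -> Tuple[int, ...]:
        return tuple(c(x) for c in ranking)

    candidates = list(product(range(1, r + 1), repeat=r))
    best = min(profile(x) for x in candidates)
    return [x for x in candidates if profile(x) == best]
```

For a full ranking of all $r$ constraints, the single winner spells out the ranking order itself, which is the property the reduction below relies on.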

Suppose we are given a directed graph $G$ with $r$ vertices
identified by the digits $1, 2, \ldots, r$. A path in $G$ is
a string of digits $j_1 j_2 j_3 \cdots j_k$ such that $G$ has
edges from $j_1$ to $j_2$, $j_2$ to $j_3$, ... and $j_{k-1}$ to
$j_k$. Such a string is called a Hamilton path
if it contains each digit exactly once. It is an
NP-complete problem to determine whether an
arbitrary graph $G$ has a Hamilton path.

Suppose we let the attested surface set $X_1$ be
the set of length-$r$ paths of $G$. This is a regular
set that can be represented in space proportional
to $r|G|$, by intersecting the DFA for
Gen($\epsilon$) with a DFA that accepts all paths of $G$.^7

Now $(\mathcal{C}, \{X_1\})$ is an instance of the ranking
problem whose size is $O(r|G|)$. We observe that
any correct ranking algorithm determines if $G$

^7 The latter DFA is isomorphic to $G$ plus a start state.
The states are $0, 1, \ldots, r$; there is an arc from $j$ to $j'$
(labeled with $j'$) iff $j = 0$ or $G$ has an edge from $j$ to $j'$.



has a Hamilton path. Why? A ranking is a vector
$\vec{C} = (\mathrm{EARLY}_{j_1}, \ldots, \mathrm{EARLY}_{j_r})$, where $j_1, \ldots, j_r$
is a permutation of $1, \ldots, r$. The optimal form
under this ranking is in fact the string $j_1 \cdots j_r$.
A string is consistent with $X_1$ if it is a path
of $G$, so the ranking $\vec{C}$ is consistent with $X_1$
iff $j_1 \cdots j_r$ is a Hamilton path of $G$. If such a
ranking exists, the algorithm is bound to find it,
and otherwise to return a failure code. Hence
the ranking problem of this section is NP-hard.
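
The correspondence between rankings and Hamilton paths can be sketched with a deliberately naive ranker that tries every permutation. It relies on the fact stated above that the optimal form under $(\mathrm{EARLY}_{j_1}, \ldots, \mathrm{EARLY}_{j_r})$ is the string $j_1 \cdots j_r$, and it assumes the graph is given as a list of directed edges; the function name `consistent_ranking` is hypothetical.

```python
from itertools import permutations
from typing import Iterable, Optional, Tuple


def consistent_ranking(r: int,
                       edges: Iterable[Tuple[int, int]]) -> Optional[Tuple[int, ...]]:
    """Return (j_1, ..., j_r) such that the ranking (EARLY_{j_1}, ..., EARLY_{j_r})
    is consistent with X_1 = the length-r paths of G, or None if none exists."""
    edge_set = set(edges)
    for order in permutations(range(1, r + 1)):
        # The optimal form under this ranking is the string j_1 ... j_r,
        # so the ranking is consistent iff that string is a path of G.
        if all((a, b) in edge_set for a, b in zip(order, order[1:])):
            return order
    return None
```

The loop may visit all $r!$ permutations, which illustrates the brute-force search that the hardness result suggests is difficult to avoid in general.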

Further, if the Satisfiability Hypothesis (SH) 
holds (Stearns and Hunt III, 1990), Hamilton
Path must take time $2^{\Omega(|G|)}$, a fortiori $2^{\Omega(r)}$.
Then any ranking algorithm takes $2^{\Omega(n)}$ ($n = |\mathcal{C}|$).

5.2 Discussion 

Since each ranking of the constraints $\mathrm{EARLY}_j$
is trivial to test against $X_1$ (by DFA intersection),
the NP-hardness of ranking them arises
not from the difficulty of each test (though other
constraint sets do have such hard tests! see §6)
but from the $2^n$ possible rankings. A brute-force
check of exponentially many rankings takes time
$2^{\Theta(n)}$. Thus, given SH, no ranking algorithm
can consistently beat such a brute-force check.

Note that our construction shows NP-hardness
for even a restricted version of the
ranking problem: finite-state grammars and finite
attested surface sets. The result holds up
even if we also make the bounded-violations assumption
(see §2): the violation count can stop
at $r$, since $\mathrm{EARLY}_j$ need only work correctly on
strings of length $r$. We revise the construction,
modifying the automaton for each $\mathrm{EARLY}_j$ by
intersection (more or less) with the straight-line
automaton for $\Sigma^r$. This preserves $|\mathcal{C}|$ and $X_1$
and blows up the ranker's input $\mathcal{C}$ by only $O(r)$.

By way of mitigating this stronger result, we
note that the construction in the previous paragraph
bounds $|X_1|$ by $r!$ and the number of violations
by $r$. These bounds (as well as $|\mathcal{C}| = r$)
increase with the order $r$ of the input graph. If
the bounds were imposed by universal grammar,
the construction would not be possible and NP-hardness
might not hold. Unfortunately, any
universal bounds on $|X_i|$ or $|\mathcal{C}|$ would hardly be
small enough to protect the ranking algorithm
from having to solve huge instances of Hamilton
path.^8 As for bounded violations, the only real

^8 We expect attested sets $X_i$ to be very large,
reason for imposing this restriction is to ensure
that the OT grammar defines a regular relation
(Frank and Satta, 1998; Karttunen, 1998).



In recent work, Eisner (2000) argues that the
restriction is too severe for linguistic description,
and proposes a more general class of "directional
constraints" under which OT grammars
remain regular.^9 If this relaxed restriction
is substituted for a universal bound on violations,
the ranking problem remains NP-hard,
since each $\mathrm{EARLY}_j$ is a directional constraint.

A more promising "way out" would be to uni- 
versally restrict the size or structure of the au- 
tomaton that describes the attested set. The 
set used in our construction was quite artificial. 

However, in §7 we will answer all these objections:
we will show the problem to be $\Sigma_2^p$-complete,
using finite-state constraints with at
most 1 violation (which, however, will not interact
as simply) and a natural attested set.

5.3 Available Algorithms 

The NP-hardness result above suggests that ex- 
isting algorithms designed for this ranking prob- 
lem are either incorrect or intractable on certain 
cases. Again, this does not rule out efficient al- 
gorithms for variants of the problem — e.g., for 
a specific universal C — nor does it rule out algo- 
rithms that tend to perform well in the average 
case, or on small inputs, or on real data. 

T&S proposed an algorithm for this problem, 
RIP/CD, but left its efficiency and correctness
for future research (p. 39); Tesar and Smolensky
(2000) show that it is not guaranteed to
succeed. Tesar (1997) gives a related algorithm
based on MRCD (see §4.2), but which sometimes
requires iterating over all the candidates
in an attested surface set; this might easily be
intractable even when the set is finite.

6 Complexity of OT Generation 

The ranking algorithms in §§4.1-4.2 relied on
the existence of an algorithm to compute the independently
interesting "language production"
function Opt($\vec{C}$, $u$), which maps underlying $u$
to the set of optimal candidates in Gen($u$).



especially in the more general case where they reflect un- 
certainty about the underlying form. That is why we de- 
scribe them compactly by DFAs. A universal constraint 
set $\mathcal{C}$ would also have to be very large (footnote 4).

^9 Allowing directional constraints would not change
any of the classifications in this paper.



In this section, we consider the computational
complexity of some functions related to Opt:^10

• OptVal($\vec{C}$, $u$): returns $\min_{x \in \mathrm{Gen}(u)} \vec{C}(x)$.
This is the violation vector shared by all
the optimal candidates $x \in \mathrm{Opt}(\vec{C}, u)$.

• OptValZ($\vec{C}$, $u$): returns "yes" iff the last
component of the vector OptVal($\vec{C}$, $u$) is
zero. This decision problem is interesting
only because if it cannot be computed efficiently
then neither can OptVal (or Opt).

• Beatable($\vec{C}$, $u$, $(k_1, \ldots, k_n)$): returns "yes"
iff OptVal($\vec{C}$, $u$) $< (k_1, \ldots, k_n)$.

• Best($\vec{C}$, $u$, $(k_1, \ldots, k_n)$): returns "yes" iff
OptVal($\vec{C}$, $u$) $= (k_1, \ldots, k_n)$.

• Check($\vec{C}$, $x$): returns "yes" iff
$x \in \mathrm{Opt}(\vec{C}, u_x)$. This checks whether an attested
form is consistent with $\vec{C}$.

• CheckSSet($\vec{C}$, $X$): returns "yes" iff
Check($\vec{C}$, $x$) for some $x \in X$. This checks
whether an attested surface set (namely $X$)
is consistent with $\vec{C}$.

These problems place a lower bound on the diffi- 
culty of OT generation, since an algorithm that 
found a reasonable representation of Opt(C, u) 
(e.g., a DFA) could solve them immediately, 
and an algorithm that found an exemplar
$x \in \mathrm{Opt}(\vec{C}, u)$ could solve all but CheckSSet
immediately. §7 will relate them to OT learning.
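
For intuition only, the functions listed above can be written out by brute force when the candidate set is small and finite. This sketch assumes a non-empty ranking, takes the candidate set Gen($u$) explicitly as a list, and lets `gen(x)` stand for Gen($u_x$); all names (`opt_val`, `opt_val_z`, `beatable`, `best`, `check`, `check_sset`, and the toy helpers) are hypothetical, and nothing here reflects the complexity of the general problems.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Constraint = Callable[[str], int]


def profile(ranking: Sequence[Constraint], x: str) -> Tuple[int, ...]:
    return tuple(c(x) for c in ranking)


def opt_val(ranking: Sequence[Constraint], cands: Iterable[str]) -> Tuple[int, ...]:
    """Violation vector shared by the optimal candidates."""
    return min(profile(ranking, x) for x in cands)


def opt_val_z(ranking: Sequence[Constraint], cands: Iterable[str]) -> bool:
    return opt_val(ranking, cands)[-1] == 0   # assumes a non-empty ranking


def beatable(ranking, cands, k: Sequence[int]) -> bool:
    return opt_val(ranking, cands) < tuple(k)


def best(ranking, cands, k: Sequence[int]) -> bool:
    return opt_val(ranking, cands) == tuple(k)


def check(ranking, x: str, gen: Callable[[str], List[str]]) -> bool:
    """Is the attested form x optimal among its competitors gen(x)?"""
    return profile(ranking, x) == opt_val(ranking, gen(x))


def check_sset(ranking, xset: Iterable[str], gen) -> bool:
    """Is some member of the attested surface set optimal?"""
    return any(check(ranking, x, gen) for x in xset)


# Hypothetical toy setup: 2-bit candidates, two constraints.
def first_bit_off(s: str) -> int:
    return int(s[0] == "1")


def second_bit_on(s: str) -> int:
    return int(s[1] == "0")


def two_bit_gen(x: str) -> List[str]:
    return ["00", "01", "10", "11"]


TOY_RANKING = [first_bit_off, second_bit_on]
```

As the text notes, an exemplar of Opt answers every question except CheckSSet directly; here `check_sset` simply falls back on testing each member of the attested set.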

6.1 Past Results 

Under finite-state assumptions, Ellison (1994)
showed that for any fixed $\vec{C}$, a representation
of Opt($\vec{C}$, $u$) could be generated in time
$O(|u| \log |u|)$, making all the above problems
tractable. However, Eisner (1997a) showed generation
to be intractable when $\vec{C}$ was not fixed,
but rather considered to be part of the input,
as when generation is called by an algorithm
like RcdAll that learns rankings. Specifically,
Eisner showed that OptValZ is NP-hard. Similarly,
Wareham (1998, theorem 4.6.4) showed
that a version of Beatable is NP-hard.^11 (We
will obtain more precise classifications below.)



^10 All these functions take an additional argument Gen,
which we suppress for readability.

^11 Wareham also gave hardness results for versions of
Beatable where some parameters are bounded or fixed.
To put this another way, the worst-case complexity
of generation problems is something like
$O(|u| \log |u|)$ times a term exponential in $|\vec{C}|$.

Thus there are some grammars for which generation
is very difficult by any algorithm. So
when testing exponentially many rankings (§5),
a learner may need to spend exponential time
testing an individual ranking.

We offer an intuition as to why generation can
be so hard. In successive-winnowing algorithms
like that of (Eisner, 1997a), the candidate set
begins as a large simple set such as $\Sigma^*$, and is
filtered through successive constraints to end up
(typically) as a small simple set such as the
singleton $\{x_i\}$. Both these sets can be represented
and manipulated as small DFAs. The trouble is
that intermediate candidate sets may be complex
and require exponentially large DFAs to
represent. (Recall that the intersection of DFAs
can grow as the product of their sizes.)
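The product growth can be observed in a small sketch. It assumes a toy family of two-state DFAs, each of which merely remembers whether one particular symbol has been seen; the helper names `seen_dfa` and `product_reachable` are hypothetical. Intersecting r such automata by the standard product construction yields 2^r reachable states.

```python
def seen_dfa(symbol, alphabet):
    """Two-state DFA: state 1 (accepting) once `symbol` has been read."""
    delta = {(q, a): 1 if (q == 1 or a == symbol) else 0
             for q in (0, 1) for a in alphabet}
    return delta, 0, {1}

def product_reachable(dfas, alphabet):
    """Count reachable states of the product (intersection) automaton."""
    start = tuple(s for _, s, _ in dfas)
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for a in alphabet:
            # Advance every component DFA on the same input symbol.
            nxt = tuple(d[(q, a)] for (d, _, _), q in zip(dfas, state))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)

alphabet = "1234"
dfas = [seen_dfa(a, alphabet) for a in alphabet]
print(product_reachable(dfas, alphabet))   # 16, i.e. 2 ** 4
```

Each component has only two states, yet the product must track every subset of symbols seen so far.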



For example, Eisner's (1997a) NP-hardness
construction led to such an intermediate
candidate set, consisting of all permutations of $r$
digits. Such a set arises simply from a hierarchy
such as $\langle \mathrm{Project}_1, \ldots \mathrm{Project}_r, \mathrm{Short} \rangle$,
where $\mathrm{Project}_j(x) = 0$ provided that $j$
appears (at least once) in $x$, and $\mathrm{Short}(x) = |x|$.
(Adding a bottom-ranked constraint that
prefers $x$ to encode a path in a graph $G$ forces
Opt to search for a Hamilton path in $G$, which
demonstrates NP-hardness of OptValZ.)
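A brute-force sketch of this hierarchy, under the assumptions that Project_j assigns one violation when digit j is absent and that Short counts the length of the candidate, confirms that the surviving candidates are exactly the permutations of the r digits. The function name `optimal_set` and the length bound `max_len` are illustrative choices, not part of the original construction.

```python
from itertools import permutations, product

def optimal_set(r, max_len):
    """Optimal candidates under <Project_1, ..., Project_r, Short>."""
    digits = [str(d) for d in range(1, r + 1)]
    # Project_j(x) = 0 if digit j appears in x, else 1 (assumed encoding).
    ranking = [lambda x, j=j: 0 if j in x else 1 for j in digits]
    ranking.append(len)                      # Short(x) = |x| (assumed)
    candidates = ["".join(t)
                  for n in range(max_len + 1)
                  for t in product(digits, repeat=n)]
    vec = lambda x: tuple(c(x) for c in ranking)
    best = min(vec(x) for x in candidates)
    return {x for x in candidates if vec(x) == best}

print(sorted(optimal_set(3, 4)))
# ['123', '132', '213', '231', '312', '321']
```

The set of all r! permutations is the kind of intermediate candidate set that a small DFA cannot represent compactly.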

6.2 Relevant Complexity Classes 

Perhaps the reader recalls that
$\mathrm{P} \subseteq \mathrm{NP} \cap \mathrm{coNP} \subseteq \mathrm{NP} \cup \mathrm{coNP} \subseteq \mathrm{D^P} \subseteq \Delta_2^p = \mathrm{P^{NP}} \subseteq \Sigma_2^p = \mathrm{NP^{NP}}$.
If not, we will review these classes as they arise.^12
These are classes of decision problems, i.e.,
functions taking values in {yes, no}. Hardness
and completeness for such classes are defined via
many-one (Karp) reductions: $g$ is at least as
hard as $f$ iff $(\forall x)\, f(x) = g(T(x))$ for some
function $T(x)$ computable in polynomial time.^13

In contrast, OptP is a class of integer-valued
functions, introduced by Krentel (1988). Recall
that NP is the class of decision problems
solvable in polytime by a nondeterministic Turing
machine: each control branch of the machine
checks a different possibility and gives a yes/no
answer, and the machine returns the disjunction
of the answers. For coNP, the machine returns
the conjunction. For OptP, each branch writes
a binary integer $\ge 0$, and the machine returns
the minimum (or maximum) of these answers.

^12 Problems in all but P are widely suspected to require
exponential time, which suffices by brute-force search.
(Smaller classes allow "more cleanly parallel" search.)

^13 $g$ is X-hard if it is at least as hard as all $f \in X$,
and X-complete if also $g \in X$.

A canonical example (analogous to OptVal)
is the Traveling Salesperson problem: finding
the minimum cost TspVal($G$) of all tours of an
integer-weighted graph $G$. It is OptP-complete
in the sense that all functions $f$ in OptP can
be metrically reduced to it (Krentel, 1988,
p. 493). A metric reduction solves an instance of
$f$ by transforming it to an instance of $g$ and then
transforming the integer result of $g$:
$(\forall x)\, f(x) = T_2(x, g(T_1(x)))$ for some
polytime-computable functions
$T_1 : \Sigma^* \to \Sigma^*$ and $T_2 : \Sigma^* \times \mathbb{N} \to \mathbb{N}$.

Krentel showed that OptP-complete problems
yield complete problems for decision
classes under broad conditions. The question
TspVal($G$) $\le k$ is of course the classical TSP
decision problem, which is NP-complete. (It is
analogous to Beatable.) The reverse question
TspVal($G$) $\ge k$ (which is related to Check) is
coNP-complete. The question TspVal($G$) $= k$
(analogous to Best) is therefore in the class
$\mathrm{D^P} = \{L_1 \cap L_2 : L_1 \in \mathrm{NP} \text{ and } L_2 \in \mathrm{coNP}\}$
(Papadimitriou and Yannakakis, 1982), and it
is complete for that class. Finally, suppose
we wish to ask whether the optimal tour is
unique. (Like OptValZ and CheckSSet, this
asks about a complex property of the optimum.)
Papadimitriou (1984) first showed this question
to be complete for $\Delta_2^p = \mathrm{P^{NP}}$, the class of
languages decidable in polytime by deterministic
Turing machines that have unlimited access
to an oracle that can answer NP questions
in unit time. (Such a machine can certainly
decide uniqueness: It can compute the integer
TspVal($G$) by binary search, asking the oracle
for various $k$ whether or not TspVal($G$) $\le k$,
and then ask it a final NP question: do there
exist two distinct tours with cost TspVal($G$)?)
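The oracle-based procedure can be sketched as follows. This is only an illustration: the NP oracle is simulated by brute force in the hypothetical function `tsp_at_most`, the graph is assumed to be a complete directed graph given as a matrix of nonnegative integer weights, and tours are assumed to start at vertex 0.

```python
from itertools import permutations

def tour_cost(weights, tour):
    """Cost of the closed tour visiting the vertices in the given order."""
    n = len(tour)
    return sum(weights[tour[i]][tour[(i + 1) % n]] for i in range(n))

def all_tours(weights):
    """All directed tours, each fixed to start at vertex 0."""
    return [(0,) + p for p in permutations(range(1, len(weights)))]

def tsp_at_most(weights, k):
    """Stand-in for the NP oracle: is there a tour of cost <= k?"""
    return any(tour_cost(weights, t) <= k for t in all_tours(weights))

def tsp_val(weights):
    """Binary search for the optimum using only yes/no oracle answers."""
    lo, hi = 0, sum(max(row) for row in weights)  # hi bounds any tour cost
    while lo < hi:
        mid = (lo + hi) // 2
        if tsp_at_most(weights, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

def optimal_tour_is_unique(weights):
    """Final question (brute-forced here): is there exactly one optimal tour?"""
    best = tsp_val(weights)
    return sum(tour_cost(weights, t) == best for t in all_tours(weights)) == 1

w = [[0, 1, 5],
     [4, 0, 2],
     [3, 6, 0]]
print(tsp_val(w), optimal_tour_is_unique(w))   # 6 True
```

The number of oracle calls in `tsp_val` is logarithmic in the cost bound, hence polynomial in the size of the binary-encoded input.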

6.3 New Complexity Results 

It is quite easy to show analogous results for
OT generation. Our main tool will be one of
Krentel's (1988) OptP-complete problems: Minimum
Satisfying Assignment. If $\phi$ is a CNF
boolean formula on $n$ variables, then Msa($\phi$)
returns the lexicographically minimal bitstring
$b_1 b_2 \cdots b_n$ that represents a satisfying
assignment for $\phi$, or $1^n$ if no such bitstring exists.^14

We consider only problems where we can
compute $C_i(x)$, or determine whether
$x \in$ Gen($u$), in polytime. We further assume that
Gen produces only candidates of length polynomial
in the size of the problem input, or more
weakly, that our functions need not produce correct
answers unless at least one optimal candidate
is so bounded.

Our hardness results (except as noted) apply
even to OT grammars with the finite-state and
bounded-violations assumptions (§§). In fact,
we will assume without further loss of generality
(Ellison, 1994; Frank and Satta, 1998;
Karttunen, 1998) that constraints are {0, 1}-valued,
hence representable by unweighted DFAs.

Notation: We may assume that all formulas
$\phi$ use variables from a set $\{v_1, v_2, \ldots v_{O(|\phi|)}\}$.
Let $\ell(\phi)$ be the maximum $i$ such that $v_i$
appears in $\phi$. We define the constraint $C_\phi$ to map
strings of at least $\ell(\phi)$ bits to {0, 1}, defining
$C_\phi(b_1 b_2 \cdots) = 0$ iff $\phi$ is true when the variables
$v_i$ in $\phi$ are instantiated respectively to values $b_i$.

If we do not make the finite-state assumptions,
then any $C_\phi$ can be represented trivially
in size $|\phi|$. But under these assumptions, we
must represent $C_\phi$ as a DFA that accepts just
those bitstrings that satisfy $\phi$. While this is
always possible (operators $\wedge$, $\vee$, $\neg$ in $\phi$ correspond
to DFA operations), we necessarily take care in
this case to use only $C_\phi$ whose DFAs are polynomial
in $|\phi|$. In particular, if $\phi$ is a disjunction of
(possibly negated) literals, such as $b_2 \vee b_3 \vee \neg b_7$,
then a DFA of $\ell(\phi) + 2$ states suffices.

We begin by showing that OptVal($\vec{C}$, $u$) is
OptP-complete. It is obvious under our restrictions
that it is in the class OptP; indeed it is a
perfect example. Each nondeterministic branch
of the machine considers some string $x$ of length
$\le p(|u|)$, simply writing the bitstring $\vec{C}(x)$ if
$x \in$ Gen($u$) and $1^n$ otherwise.

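To show OptP-hardness, we metrically reduce
Msa($\phi$) to OptVal, where $\phi = \bigwedge_{i=1}^{m} D_i$ is in
CNF. Let $r = \ell(\phi)$, and put $\mathcal{L} = \{\epsilon\}$ and
Gen($\epsilon$) $= \{0,1\}^r$. Also put
$D'_i = D_i \vee (v_1 \wedge \cdots \wedge v_r)$,
so that the all-ones string $1^r$ satisfies each $C_{D'_i}$. Now let
$\vec{C} = \langle C_{D'_1}, \ldots C_{D'_m}, C_{\neg v_1}, \ldots C_{\neg v_r} \rangle$.
Then Msa($\phi$) = the last $r$ bits of OptVal($\vec{C}$, $\epsilon$).^15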
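The reduction can be checked on small formulas with a brute-force sketch. It assumes a CNF formula is encoded as a list of clauses in the usual signed-integer style (literal `i` for v_i, `-i` for its negation) and that each clause constraint is relaxed so that the all-ones string always satisfies it; the function names are illustrative, and enumeration replaces the finite-state machinery.

```python
from itertools import product

def clause_true(clause, bits):
    """Is the disjunction `clause` true under the assignment `bits`?"""
    return any(bits[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)

def msa(cnf, r):
    """Lexicographically minimal satisfying assignment, or 1^r if none."""
    for bits in product((0, 1), repeat=r):    # yields lexicographic order
        if all(clause_true(c, bits) for c in cnf):
            return bits
    return (1,) * r

def reduction_ranking(cnf, r):
    """Ranked constraints: relaxed clauses first, then one per variable."""
    def c_relaxed(clause):                    # 0 (satisfied) also on 1^r
        return lambda bits: 0 if (clause_true(clause, bits) or all(bits)) else 1
    def c_not_v(i):                           # violated exactly when v_i = 1
        return lambda bits: bits[i]
    return [c_relaxed(c) for c in cnf] + [c_not_v(i) for i in range(r)]

def msa_via_opt_val(cnf, r):
    """Read Msa off the last r components of the optimal violation vector."""
    gen = list(product((0, 1), repeat=r))     # Gen(e) = {0,1}^r
    ranking = reduction_ranking(cnf, r)
    opt = min(tuple(c(x) for c in ranking) for x in gen)
    return opt[-r:]

phi = [[1, 2], [-1, 3]]                       # (v1 or v2) and (not v1 or v3)
print(msa(phi, 3), msa_via_opt_val(phi, 3))   # (0, 1, 0) (0, 1, 0)
```

For an unsatisfiable formula both functions return the all-ones string, since it is then the only candidate satisfying every relaxed clause constraint.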

Because OptVal is OptP-complete, Krentel's
theorem 3.1 says it is complete for $\mathrm{FP^{NP}}$, the
set of functions computable in polynomial time
using an oracle for NP. This is the function class
corresponding to the decision class $\mathrm{P^{NP}} = \Delta_2^p$.

Next we show that Beatable($\vec{C}$, $u$, $\vec{k}$) is
NP-complete. It is obviously in NP. To
show NP-hardness (and power index 1, so
that SH (§1) implies runtime $2^{\Omega(|\mathrm{input}|)}$),
again put $\phi = \bigwedge_{i=1}^{m} D_i$, $r = \ell(\phi)$, and
Gen($\epsilon$) $= \{0,1\}^r$. Now CNF-Sat($\phi$) =
Beatable($\langle C_{D_1}, \ldots C_{D_m} \rangle$, $\epsilon$, $\langle 0, 0, \ldots 0, 1 \rangle$).

Next consider Check($\vec{C}$, $x$). This is simply
$\neg$Beatable($\vec{C}$, $u_x$, $\vec{C}(x)$). Even when
restricted to calls of this form, Beatable remains
just as hard. To show this, we tweak the above
construction so we can write $\vec{C}(x)$ (for some $x$)
in place of $\langle 0, 0, \ldots 0, 1 \rangle$. Add the new element
$\epsilon$ to Gen($\epsilon$), and extend the constraint
definitions by putting $C_{D_i}(\epsilon) = 0$ iff $i < m$. Then
CNF-Sat($\phi$) = Beatable($\vec{C}$, $\epsilon$, $\vec{C}(\epsilon)$).
Therefore Check = $\neg$Beatable is coNP-complete.
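Both reductions from CNF-Sat can be exercised with a brute-force sketch. It assumes clauses are encoded as lists of signed integers, candidates are bit tuples, and the extra empty candidate is represented by the empty tuple; the function names are hypothetical and at least one clause and one variable are assumed.

```python
from itertools import product

def clause_true(clause, bits):
    return any(bits[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)

def sat_via_beatable(cnf, r):
    """phi is satisfiable iff OptVal < (0, ..., 0, 1)."""
    ranking = [lambda x, c=c: 0 if clause_true(c, x) else 1 for c in cnf]
    gen = list(product((0, 1), repeat=r))
    target = (0,) * (len(cnf) - 1) + (1,)
    return min(tuple(c(x) for c in ranking) for x in gen) < target

def sat_via_check(cnf, r):
    """phi is satisfiable iff the empty candidate is NOT optimal."""
    m, empty = len(cnf), ()
    def make(i, clause):
        def constraint(x):
            if x == empty:
                # Only the last constraint is violated by the empty candidate.
                return 0 if i < m - 1 else 1
            return 0 if clause_true(clause, x) else 1
        return constraint
    ranking = [make(i, c) for i, c in enumerate(cnf)]
    gen = [empty] + list(product((0, 1), repeat=r))
    vec = lambda x: tuple(c(x) for c in ranking)
    check_empty = vec(empty) == min(vec(x) for x in gen)
    return not check_empty

phi = [[1, 2], [-1]]          # satisfiable by v1 = 0, v2 = 1
print(sat_via_beatable(phi, 2), sat_via_check(phi, 2))   # True True
```

In the second function the violation vector of the empty candidate plays the role of the fixed bound in the first, which is why a Check call suffices.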

Next we consider Best($\vec{C}$, $u$, $\vec{k}$). This problem
is in $\mathrm{D^P}$ for the same simple reason that
the question TspVal($G$) $= k$ is (see above).
If we do not make the finite-state assumptions,
it is also $\mathrm{D^P}$-hard by reduction from the
$\mathrm{D^P}$-complete language Sat-Unsat =
$\{(\phi, \psi) : \phi \in \mathrm{Sat}, \psi \notin \mathrm{Sat}\}$
(Papadimitriou and Yannakakis, 1982), as follows:
Sat-Unsat($\phi$, $\psi$) = Best($\langle C_\phi, C_\psi \rangle$, $\epsilon$, $\langle 0, 1 \rangle$),
renaming variables as necessary so that $\phi$ uses
only $v_1, \ldots v_r$ and $\psi$ uses only $v_{r+1}, \ldots v_s$,
and Gen($\epsilon$) $= \{0,1\}^{s}$.



^14 Krentel's presentation is actually in terms of Maximum
Satisfying Assignment, which merely reverses the
roles of 0 and 1. Also, Krentel does not mention that
$\phi$ can be restricted to CNF, but importantly for us, his
proof of OptP-hardness makes this fact clear.



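It is not clear whether Best remains $\mathrm{D^P}$-hard
under the finite-state assumptions. But consider
a more flexible variant Range($\vec{C}$, $u$, $\vec{k}_1$, $\vec{k}_2$)
that asks whether OptVal($\vec{C}$, $u$) is between
$\vec{k}_1$ and $\vec{k}_2$ inclusive. This is also in $\mathrm{D^P}$,
and is $\mathrm{D^P}$-hard because Sat-Unsat($\phi \# \psi$) =
Range($\langle C_{D_1}, \ldots C_{D_m}, C_{D'_1}, \ldots C_{D'_{m'}} \rangle$, $\epsilon$,
$\langle 0, \ldots 0, 0, \ldots 1 \rangle$, $\langle 0, \ldots 0, 1, \ldots 1 \rangle$),
where $\phi$, $\psi$, Gen are as before and
$\phi = \bigwedge_{i=1}^{m} D_i$, $\psi = \bigwedge_{i=1}^{m'} D'_i$.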
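A brute-force sketch of the Range reduction follows, assuming the two formulas use disjoint variables among v_1 ... v_s, that clauses are lists of signed integers, and that the second formula has at least one clause. The name `sat_unsat_via_range` is illustrative.

```python
from itertools import product

def clause_true(clause, bits):
    return any(bits[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)

def c_clause(clause):
    """{0,1}-valued constraint: 0 iff the clause is satisfied."""
    return lambda bits: 0 if clause_true(clause, bits) else 1

def sat_unsat_via_range(phi, psi, s):
    """True iff phi is satisfiable and psi is not, decided via one Range test."""
    ranking = [c_clause(d) for d in phi] + [c_clause(d) for d in psi]
    gen = list(product((0, 1), repeat=s))
    value = min(tuple(c(x) for c in ranking) for x in gen)
    k1 = (0,) * len(phi) + (0,) * (len(psi) - 1) + (1,)   # smallest allowed
    k2 = (0,) * len(phi) + (1,) * len(psi)                # largest allowed
    return k1 <= value <= k2

phi = [[1]]               # satisfiable
psi = [[2], [-2]]         # unsatisfiable
print(sat_unsat_via_range(phi, psi, 2))   # True
```

The lower bound excludes the all-zero vector (both formulas satisfiable), and the upper bound excludes any vector with a violation among the first m components (first formula unsatisfiable).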

Finally, we show that the decision problems
CheckSSet and OptValZ are $\Delta_2^p$-complete.



^15 $C_{D'_i}$ requires a DFA of $2r + 2$ states. Remark: Without
the finite-state assumptions, we could just write
Msa($\phi$) = OptVal($\langle C_{\phi \wedge \neg v_1}, \ldots C_{\phi \wedge \neg v_r} \rangle$, $\epsilon$) for any $\phi$.
They are in $\Delta_2^p$ by an algorithm similar to
the one used for TSP uniqueness above: since
Beatable can be determined by an NP oracle,
we can find OptVal($\vec{C}$, $u$) by binary search.^16
An additional call to an NP oracle decides
CheckSSet($\vec{C}$, $X$) by asking whether $\exists x \in X$
such that $\vec{C}(x) =$ OptVal($\vec{C}$, $u$). Such a call
also trivially decides OptValZ.

The reduction to show $\Delta_2^p$-hardness is from
a $\Delta_2^p$-complete problem exhibited by Krentel
(1988, theorem 3.4): $\mathrm{Msa}_{lsb}$ accepts $\phi$ iff
the final (least significant) bit of Msa($\phi$)
is 0. Given $\phi$, we use the same grammar
as when we reduced Msa to OptVal:
since Msa and OptVal then share the same
last bit, $\mathrm{Msa}_{lsb}(\phi)$ = OptValZ($\vec{C}$, $\epsilon$) =
CheckSSet($\vec{C}$, $\{0,1\}^{r-1}0$).

Note that we did not have to use an unnatural
attested surface set as in §5.1. The
set $\{0,1\}^{r-1}0$ means that the learner has
observed only certain bits of the utterance,
exactly the kind of partial observation that we
expect. So even some restriction to "reasonable"
attested sets is unlikely to help.

7 Complexity of OT Ranking 

We now consider two ranking problems. These 
ask whether C can be ranked in a manner con- 
sistent with attested forms or attested sets: 

• Rankable($\mathcal{C}$, $\{x_1, \ldots x_m\}$): returns "yes"
iff there is a ranking $\vec{C}$ of $\mathcal{C}$ such that
Check($\vec{C}$, $x_i$) for all $i$.

• RankableSSet($\mathcal{C}$, $\{X_1, \ldots X_m\}$): returns
"yes" iff there is a ranking $\vec{C}$ of $\mathcal{C}$ such that
CheckSSet($\vec{C}$, $X_i$) for all $i$.

^16 This takes polynomially many steps provided that
$\log C_i(x)$ is polynomial in $|x|$ (as it is under the
finite-state assumptions). We've already assumed that $|x|$ itself
is polynomial on the input size, at least for optimal $x$.

We do not have an exact classification of
Rankable at this time. But interestingly,
the special case where $m = 1$ and the constraints
take values in {0, 1} (which has sufficed
to show most of our hardness results)
is only coNP-complete, the same as Check,
which merely verifies a solution. Why? Here
Rankable need only ask whether there exists
any $y \in$ Gen($u_{x_1}$) that satisfies a proper superset
of the constraints that $x_1$ satisfies. For if so,
$x_1$ cannot be optimal under any ranking, and
if not, then we can simply rank the constraints
that $x_1$ satisfies above the others. This immediately
implies that the special case is in coNP. It
also implies it is coNP-hard: using the grammar
from our proof that Check is coNP-hard (§6.3),
we write CNF-Sat($\phi$) = $\neg$Rankable($\mathcal{C}$, $\{\epsilon\}$).
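The superset criterion for this special case can be compared against exhaustive search over rankings. The sketch below assumes a single attested form, {0,1}-valued constraints given as Python functions (0 meaning satisfied), and a finite candidate list; the names `rankable_single` and `rankable_brute` are hypothetical.

```python
from itertools import permutations

def satisfied(constraints, x):
    """Indices of the constraints that candidate x satisfies."""
    return frozenset(i for i, c in enumerate(constraints) if c(x) == 0)

def rankable_single(constraints, gen, x):
    """x can be made optimal iff no competitor satisfies a proper superset."""
    sx = satisfied(constraints, x)
    return not any(sx < satisfied(constraints, y) for y in gen)

def rankable_brute(constraints, gen, x):
    """Reference answer: try every ranking of the constraints."""
    for order in permutations(constraints):
        vec = lambda z: tuple(c(z) for c in order)
        if vec(x) == min(vec(y) for y in gen):
            return True
    return False

gen = ["00", "01", "10", "11"]
constraints = [
    lambda x: 0 if x[0] == "1" else 1,     # first bit is 1
    lambda x: 0 if x[1] == "1" else 1,     # second bit is 1
    lambda x: 0 if x[0] != x[1] else 1,    # the two bits differ
]
print([rankable_single(constraints, gen, x) for x in gen])
# [False, True, True, True]
```

The criterion needs one pass over the candidates, whereas the reference search tries all n! rankings; the hardness in the text comes from the candidate set being exponentially large, not from the number of rankings.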

The RcdAll algorithm presented earlier provides an
upper bound on the complexity of Rankable. We
saw there that RcdAll can decide Rankable
with $O(n^2 m)$ calls to Opt (where $n = |\mathcal{C}|$).
In fact, it suffices to call Check rather than
Opt (since RcdAll only tests whether $x_i \in$
Opt($\cdots$)). Since Check $\in$ coNP, it follows
that Rankable is in $\mathrm{P^{coNP}} = \mathrm{P^{NP}} = \Delta_2^p$.^17

RankableSSet is certainly in $\Sigma_2^p$, since it
may be phrased in $\exists\forall$ form as
$(\exists \vec{C}, \{x_i \in X_i\})\,(\forall i, y_i \in \mathrm{Gen}(u_{x_i}))\ \vec{C}(x_i) \le \vec{C}(y_i)$.
We saw earlier that it is NP-hard even when the
constraints interact simply. One suspects it is
$\Delta_2^p$-hard, since merely verifying a solution
(i.e., CheckSSet) is $\Delta_2^p$-complete (§6.3).
We now show that it is actually
$\Sigma_2^p$-hard and therefore $\Sigma_2^p$-complete.

The proof is by reduction from the canonical
$\Sigma_2^p$-complete problem $\mathrm{QSat}_2(\phi, r)$, where
$\phi = \bigwedge_{i=1}^{m} D_i$ is a CNF formula with
$\ell(\phi) \ge r \ge 0$. This returns "yes" iff

$\exists b_1, \ldots b_r \; \neg\exists b_{r+1}, \ldots b_s \; \phi(b_1, \ldots b_s)$,

where $s \stackrel{\mathrm{def}}{=} \ell(\phi)$ and $\phi(b_1, \ldots b_s)$ denotes the
truth value of $\phi$ when the variables $v_1, \ldots v_s$ are
bound to the respective binary values $b_1, \ldots b_s$.
Given an instance of $\mathrm{QSat}_2$ as above, put
$\mathcal{L} = \{\epsilon\}$ and Gen($\epsilon$) $= \{0,1\}^{s} \cup X$ where
$X$ = the set $\{0,1\}^r 2$. Let
$\mathcal{C} = \{C_{D_1}, \ldots C_{D_m}, C_{v_1}, \ldots C_{v_r}, C_{\neg v_1}, \ldots C_{\neg v_r}, \bar{X}\}$,
where all constraints have range {0, 1}, we extend
$C_{D_i}$ over $X$ by defining it to be satisfied (i.e., take value
0) on all candidates in $X$, and we define $\bar{X}$ to
be satisfied on exactly those candidates not in
$X$. As before, $C_{v_i}$ and $C_{\neg v_i}$ are satisfied on a
candidate iff its $i$th bit is 1 or 0 respectively,
regardless of whether the candidate is in $X$.

^17 Tesar's EDCD and MRCD algorithms (^) also
run in polytime given an NP oracle. They too decide
Rankable with polynomially many calls to Opt. While
they cannot substitute Check for Opt, they can substitute
OptVal (since they need optimal $y$ only to compute
$\vec{C}(y)$). Each call to OptVal $\in \mathrm{FP^{NP}}$ can then be
replaced by polynomially many calls to Check $\in$ coNP.

It is not relevant to RcdAll vs. EDCD that calling
Check once (coNP-complete) is in an easier complexity
class than calling OptVal once ($\mathrm{FP^{NP}}$-complete). Nor
is it relevant for any practical purpose, since these two
classes collapse under Turing (Cook) reductions.

We now claim that $\mathrm{QSat}_2(\phi, r)$ =
RankableSSet($\mathcal{C}$, $\{X\}$). The following
terminology will be useful in proving this:
Given a bit sequence $b = b_1, \ldots b_r$, define a
$b$-satisfier to be a bit string $b_1 \cdots b_r b_{r+1} \cdots b_s$
such that $\phi(b_1, \ldots b_s)$. For $1 \le i \le r$, let $B_i$, $\bar{B}_i$
denote the constraints $C_{v_i}$, $C_{\neg v_i}$ respectively if
$b_i = 1$, or vice-versa if $b_i = 0$. We then say
that a ranking $\vec{C}$ of $\mathcal{C}$ is $b$-compatible if $B_i$
precedes $\bar{B}_i$ in $\vec{C}$ for every $1 \le i \le r$.

First observe that a candidate $y \in$ Gen($\epsilon$) is a
$b$-satisfier iff it satisfies the constraints $B_1, \ldots B_r$
and $C_{D_1}, \ldots C_{D_m}$ and $\bar{X}$. From this it is not
difficult to see that if $\vec{C}$ is a $b$-compatible ranking,
then $y$ beats $x$ (i.e., $\vec{C}(y) < \vec{C}(x)$) for any
$b$-satisfier $y$ and any $x \in X$.^18 Now for the proof:

Suppose RankableSSet($\mathcal{C}$, $\{X\}$). Then
choose $x \in X$ and $\vec{C}$ a ranking of $\mathcal{C}$ such
that $x$ is optimal (i.e., Check($\vec{C}$, $x$)). For each
$1 \le i \le r$, let $b_i = 1$ if $C_{v_i}$ is ranked before
$C_{\neg v_i}$ in $\vec{C}$, otherwise $b_i = 0$. Then $\vec{C}$ is a
$b$-compatible ranking. Since $x \in X$ is optimal,
there must be no $b$-satisfiers $y$, i.e., $\mathrm{QSat}_2(\phi, r)$.

Conversely, suppose $\mathrm{QSat}_2(\phi, r)$. This
means we can choose $b_1, \ldots b_r$ such that there
are no $b$-satisfiers. Let
$\vec{C} = \langle C_{D_1}, \ldots C_{D_m}, B_1, \ldots B_r, \bar{B}_1, \ldots \bar{B}_r, \bar{X} \rangle$.
Observe that $x = b_1 \cdots b_r 2 \in X$ satisfies the first
$m + r$ of the constraints; this is optimal
(i.e., Check($\vec{C}$, $x$)), since any better candidate
would have to be a $b$-satisfier.^19 Hence there is a
ranking $\vec{C}$ consistent with $X$, i.e.,
RankableSSet($\mathcal{C}$, $\{X\}$).
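The claimed equivalence can be tested exhaustively on tiny instances. The sketch below assumes clauses are lists of signed integers, ordinary candidates are bit tuples of length s, members of X are r bits followed by the symbol 2, and every ranking of the m + 2r + 1 constraints is tried; all function names are illustrative, and the approach is feasible only for very small m and r.

```python
from itertools import permutations, product

def clause_true(clause, bits):
    return any(bits[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)

def qsat2(cnf, r, s):
    """Exists b_1..b_r such that no extension b_{r+1}..b_s satisfies cnf."""
    return any(
        not any(all(clause_true(d, b + rest) for d in cnf)
                for rest in product((0, 1), repeat=s - r))
        for b in product((0, 1), repeat=r))

def build_grammar(cnf, r, s):
    """Constraint set, Gen(e), and attested set X of the reduction."""
    in_x = lambda c: c[-1] == 2                      # X = {0,1}^r followed by 2
    xs = [b + (2,) for b in product((0, 1), repeat=r)]
    gen = list(product((0, 1), repeat=s)) + xs
    cons = [lambda c, d=d: 0 if in_x(c) or clause_true(d, c) else 1 for d in cnf]
    cons += [lambda c, i=i: 0 if c[i] == 1 else 1 for i in range(r)]  # C_{v_i}
    cons += [lambda c, i=i: 0 if c[i] == 0 else 1 for i in range(r)]  # C_{not v_i}
    cons.append(lambda c: 1 if in_x(c) else 0)                        # X-bar
    return cons, gen, xs

def rankable_sset(cons, gen, xs):
    """Is some member of X optimal under some ranking of the constraints?"""
    for ranking in permutations(cons):
        vec = lambda c: tuple(k(c) for k in ranking)
        best = min(vec(c) for c in gen)
        if any(vec(x) == best for x in xs):
            return True
    return False

phi = [[1], [2]]                                     # v1 and v2, with r = 1, s = 2
cons, gen, xs = build_grammar(phi, 1, 2)
print(qsat2(phi, 1, 2), rankable_sset(cons, gen, xs))   # True True
```

Here choosing b_1 = 0 leaves no satisfying extension, and correspondingly the candidate 0 followed by 2 is optimal once the constraint for "v_1 is false" outranks the one for "v_1 is true".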

8 Optimization vs. Derivation 

The above results mean that OT generation and 
ranking are hard. We will now see that they are 
harder than the corresponding problems in de- 
terministic derivational theories, assuming that 
the complexity classes discussed are distinct. 

A derivational grammar consists of the fol- 
lowing elements (cf. §^): 

• an alphabet $\Sigma$;

• a set $\mathcal{L} \subseteq \Sigma^*$ of underlying forms;

• a vector $\vec{R} = \langle R_1, \ldots R_n \rangle$ of rules, each of
which is a function from $\Sigma^*$ to $\Sigma^*$.

The grammar maps each $x \in \mathcal{L}$ to
$\vec{R}(x) \stackrel{\mathrm{def}}{=} R_n \circ \cdots \circ R_2 \circ R_1(x)$.
If all the rules are polytime-computable
(i.e., in the function class FP), then
so is $\vec{R}$. (By contrast, the OT analogue Opt
is complete for the function class $\mathrm{FP^{NP}}$.) It
follows that the derivational analogues of the
decision problems given at the start of §6 are in
P^20 (whereas we have seen that the OT versions
range from NP-complete to $\Delta_2^p$-complete).

How about learning? The rule ordering
problem OrderableSSet takes as input a set
$\mathcal{R}$ of possible rules, a unary integer $n$, and a set
of pairs $\{(u_1, X_1), \ldots (u_m, X_m)\}$ where $u_i \in \Sigma^*$
and $X_i \subseteq \Sigma^*$. It returns "yes" iff there is a rule
sequence $\vec{R} \in \mathcal{R}^n$ such that $(\forall i)\, \vec{R}(u_i) \in X_i$. It
is clear that this problem is in NP. This makes
it easier than its OT analogue RankableSSet
and possibly easier than Rankable.

^18 $y$ satisfies $\bar{X}$ while $x$ doesn't, so $\vec{C}(y) \ne \vec{C}(x)$. And
$\vec{C}(y) > \vec{C}(x)$ is impossible, for if $x$ satisfies any
constraint that $y$ violates, namely some $\bar{B}_i$, then it violates
a higher-ranked constraint that $y$ satisfies, namely $B_i$.

^19 Since it would have to satisfy the first $m + r$
constraints plus a later constraint, which could only be $\bar{X}$.

For interest, we show that OrderableSSet
is NP-complete, as is its restricted version
Orderable (where the attested sets $X_i$ are
replaced by attested forms $x_i$). As usual, our
result holds even with finite-state restrictions:
we can require the rules in $\mathcal{R}$ to be regular
relations (Johnson, 1972). The hardness proof
is by reduction from Hamilton Path (defined in
§5.1). Given a directed graph $G$ with vertices
$1, 2, \ldots n$, put $\Sigma = \{\#, 0, 1, 2, \ldots n\}$. Each
string we consider will be either $\epsilon$ or a permutation
of $\Sigma$. Define $\mathrm{Move}_j$ to be a rule that maps
$\alpha j \beta \# \gamma i$ to $\alpha \beta \# \gamma i j$ for any
$i, j \in \Sigma$, $\alpha, \beta, \gamma \in \Sigma^*$
such that $i = 0$ or else $G$ has an edge from $i$
to $j$, and acts as the identity function on other
strings. Also define Accept to be a rule that
maps $\# \alpha$ to $\epsilon$ for any $\alpha \in \Sigma^*$, and acts as
the identity function on other strings. Now
Orderable($\{\mathrm{Move}_1, \ldots \mathrm{Move}_n, \mathrm{Accept}\}$, $n+1$,
$\{(12 \cdots n \# 0, \epsilon)\}$) decides whether $G$ has a
Hamilton path.
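The reduction can be simulated directly on small graphs. The sketch below assumes strings are represented as Python tuples over the symbols '#', 0, and the vertex numbers, that edges are given as a set of ordered pairs, and that all (n+1)^(n+1) rule sequences are tried by brute force; the names `make_move`, `accept`, and `has_hamilton_path` are illustrative.

```python
from itertools import product

def make_move(j, edges):
    """Move_j: shift vertex j from before '#' to the end, if the edge allows."""
    def move(s):
        if '#' not in s:
            return s
        h = s.index('#')
        if j not in s[:h] or h == len(s) - 1:
            return s                          # rule does not apply
        i = s[-1]                             # last symbol placed so far
        if i == 0 or (i, j) in edges:
            p = s.index(j)
            return s[:p] + s[p + 1:] + (j,)
        return s
    return move

def accept(s):
    """Accept: map any string beginning with '#' to the empty string."""
    return () if s and s[0] == '#' else s

def has_hamilton_path(n, edges):
    """Orderable on the single pair (12...n#0, empty), by exhaustive search."""
    rules = [make_move(j, edges) for j in range(1, n + 1)] + [accept]
    start = tuple(range(1, n + 1)) + ('#', 0)
    for seq in product(rules, repeat=n + 1):
        s = start
        for rule in seq:
            s = rule(s)
        if s == ():
            return True
    return False

print(has_hamilton_path(3, {(1, 2), (2, 3)}))   # True
print(has_hamilton_path(3, {(1, 2), (1, 3)}))   # False
```

A successful sequence must move all n vertices past the '#' in an order that follows edges of G before Accept can fire, so it spells out a Hamilton path.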



9 Conclusions 

See the abstract for our most important results.
Our main conclusion is a warning that OT carries
large computational burdens. When formulating
the OT learning problem, even small nods
in the direction of realism quickly drive the
complexity from linear-time up through coNP (for
multiple competitors) into the higher complexity
classes (for multiple possible surface forms).

^20 However, Wareham (1998) analyzes a more powerful
derivational approach where the rules are nondeterministic:
each $R_i$ is a relation rather than a function.
Wareham shows that generation in this case is NP-hard
(Theorem 4.3.3.1). He does not consider learning.

Hence all OT generation and learning algo- 
rithms should be suspect. Either they oversim- 
plify their problem, or they sometimes fail, or 
they take worse than polynomial time on some 
class of inputs. (Or they demonstrate P = NP!) 

One constraint ranking problem we consider,
RankableSSet, is in fact a rare "natural"
example of a problem that is complete for the
higher complexity class $\Sigma_2^p$ ("$\exists\forall$"). Intuitively,
an OT learner must both pick a constraint
ranking ($\exists$) and check that an attested form
beats or ties all competitors under that ranking
($\forall$). Some other learning problems were already
known to be $\Sigma_2^p$-complete (Ko and Tzeng, 1991),
but ours differs in that the input has no negative
exemplars (not even implicit ones, given ties).

This paper leaves some theoretical questions
open. Most important is the exact classification
of Rankable. Second, we are interested in any
cases where problem variants (e.g., accepting vs.
rejecting the finite-state assumptions) differ in
complexity. Third, in the same spirit, parameterized
complexity analyses (Wareham, 1998)
may help further identify sources of hardness.

We are also interested in more realistic ver- 
sions of the phonology learning problem. We 
are especially interested in the possibility that 
C has internal structure, as discussed in footnote 
^, and in the problem of learning from general 
attested sets, not just attested surface sets. 

Finally, in light of our demonstrations that 
efficient algorithms are highly unlikely for the 
problems we have considered, we ask: Are there 
restrictions, reformulations, or randomized or 
approximate methods that could provably make 
OT learning practical in some sense? 